[57] ABSTRACT

In a system that accepts a given search entity from a user and 
utilizes a database to identify a possible matching entity 
from a large list of entries, a method is provided for 
evaluating the reliability of the matching entity. Preferably, 
the method is carried out with minimal human intervention. 
A user inputs a plurality of attributes to identify a given 
entity, the system identifies a possible matching entity, and 
assigns a numerical score to reflect the match quality of each
attribute. Thereafter, the method assigns a grade to each 
attribute score, assembles the grades into a key, uses the key 
to address a memory, and retrieves a confidence code or 
quality indicator from the memory. The confidence codes are 
based on empirical information and reflect the overall quality
of the match for the particular entity.

15 Claims, 6 Drawing Sheets 



[Front-page drawing: table of attributes and match scores]

Attribute                  Score
Entity Name                95
Street Number              92
Street Name                87
PO Box Number              94
City and/or Postal Code    47
State (US) or Country      100
Telephone Number           Null

Outputs shown: Match Confidence, Match Accuracy (93.3)
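
The pipeline summarized in the abstract (grade each attribute score, assemble the grades into a key, address a memory with the key) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation. The function names are hypothetical. The "A" (90-100) and "B" (50-89) ranges and the digit assignment A=0, B=1, F=2, Z=3 are taken from the detailed description; treating every score below 50 as "F" is an assumption, chosen because it is consistent with the drawing, where a score of 47 lines up with the "F" in the key.

```python
from typing import Optional, Sequence

# Digit assigned to each grade when the key is read as a base-four number.
GRADE_DIGITS = {"A": 0, "B": 1, "F": 2, "Z": 3}


def grade(score: Optional[float]) -> str:
    """Map one attribute score (0-100, or None for no entry) to a grade."""
    if score is None:
        return "Z"  # no-entry condition
    if score >= 90:
        return "A"  # clear match
    if score >= 50:
        return "B"  # possible match
    return "F"      # clear mismatch (assumed cutoff, see lead-in)


def grade_key(scores: Sequence[Optional[float]]) -> str:
    """Assemble the per-attribute grades into a key string."""
    return "".join(grade(s) for s in scores)


def table_address(key: str) -> int:
    """Read the key as a base-four number to obtain a look-up table index."""
    address = 0
    for letter in key:
        address = address * 4 + GRADE_DIGITS[letter]
    return address


# Scores from the front-page drawing, in attribute order.
scores = [95, 92, 87, 94, 47, 100, None]
key = grade_key(scores)
print(key, table_address(key))  # AABAFAZ 291
```

With seven attributes and four grades the index always falls in the range 0 to 16,383, so a flat list of 16,384 (confidence code, accuracy) pairs would serve as the memory.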



02/03/2003, EAST version: 1.03.0002 



U.S. Patent 



Aug. 19, 1997 



Sheet 1 of 6 



5,659,731 







[FIG. 5 (Sheet 5 of 6): flowchart. Legible box labels, top to bottom:]

Start
Input "given entity" (50)
Retrieve listing of possible matching entities
Assign scores for each attribute; assign grade for each score; form grade key
Retrieve confidence code and accuracy percentage from look-up table
Select matching entity and assess risk value
Is risk value small, medium, or large?
Is confidence code greater than "X"? (one such decision box for each of the SMALL, MEDIUM, and LARGE branches)
Human review / clerical follow-up
Access credit history information
Is credit rating acceptable?
Deny credit

FIG. 5
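
The decision logic of FIG. 5 can be sketched as below. This is a hedged reading of the flowchart, not a definitive one: the threshold values are invented for illustration (the specification states only that the threshold confidence code is smaller for small dollar transactions than for large ones), the routing of a failed confidence test to human review is inferred from the box labels, and the "grant credit" outcome is assumed because only the "Deny Credit" box is legible. All names are hypothetical.

```python
from typing import Callable

# Hypothetical thresholds; only their ordering (small < medium < large)
# is supported by the text.
CONFIDENCE_THRESHOLDS = {"small": 5, "medium": 7, "large": 9}


def credit_decision(
    risk: str,
    confidence_code: int,
    rating_acceptable: Callable[[], bool],
) -> str:
    """Route one matched entity through the FIG. 5 style decision flow.

    risk: "small", "medium", or "large" (assessed from the transaction).
    confidence_code: value retrieved from the look-up table for the match.
    rating_acceptable: called only when the match is trusted, standing in
        for the "access credit history information" step.
    """
    if confidence_code <= CONFIDENCE_THRESHOLDS[risk]:
        # Match not reliable enough for this risk level.
        return "human review"
    return "grant credit" if rating_acceptable() else "deny credit"


print(credit_decision("small", 8, lambda: True))   # grant credit
print(credit_decision("large", 8, lambda: True))   # human review
print(credit_decision("medium", 8, lambda: False)) # deny credit
```

Passing the credit check as a callable keeps the credit history lookup from running when the match itself is sent to a clerk.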








No.  NAME    ST#     STREET NAME  CITY    STATE  P.O. BOX
1    A       Ignore  A            F or Z  A      Ignore
2    A       Ignore  B            F or Z  A      Ignore
3    A       Ignore  F            F or Z  A      Ignore
4    A       Ignore  Z            F or Z  A      Ignore
5    B       Ignore  A            F or Z  A      Ignore
6    B       Ignore  B            F or Z  A      Ignore
7    B       Ignore  F            F or Z  A      Ignore
8    B       Ignore  Z            F or Z  A      Ignore
9    F or Z  Ignore  A            F or Z  A      Ignore
10   F or Z  Ignore  B            F or Z  A      Ignore
11   F or Z  Ignore  F            F or Z  A      Ignore
12   F or Z  Ignore  Z            F or Z  A      Ignore
13   A       Ignore  A            B       A      Ignore
14   A       Ignore  B            B       A      Ignore
15   A       Ignore  F            B       A      Ignore
16   A       Ignore  Z            B       A      Ignore
17   B       Ignore  A            B       A      Ignore
18   B       Ignore  B            B       A      Ignore
19   B       Ignore  F            B       A      Ignore
20   B       Ignore  Z            B       A      Ignore
21   F or Z  Ignore  A            B       A      Ignore
22   F or Z  Ignore  B            B       A      Ignore
23   F or Z  Ignore  F            B       A      Ignore
24   F or Z  Ignore  Z            B       A      Ignore
25   A       Ignore  A            A       A      Ignore
26   A       Ignore  B            A       A      Ignore
27   A       Ignore  F            A       A      Ignore
28   A       Ignore  Z            A       A      Ignore
29   B       Ignore  A            A       A      Ignore
30   B       Ignore  B            A       A      Ignore
31   B       Ignore  F            A       A      Ignore
32   B       Ignore  Z            A       A      Ignore
33   F or Z  Ignore  A            A       A      Ignore
34   F or Z  Ignore  B            A       A      Ignore
35   F or Z  Ignore  F            A       A      Ignore
36   F or Z  Ignore  Z            A       A      Ignore

FIG. 6
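
The legible rows of FIG. 6 are consistent with a regular enumeration: the city grade changes every twelve rows, the name grade every four rows, and the street-name grade on every row, while the street number and P.O. box are ignored and the state is held at "A". The sketch below regenerates that pattern. It is an inference from the drawing rather than something the specification states, and the function name is hypothetical.

```python
from itertools import product

# Column values in the order they appear to cycle in FIG. 6 (inferred).
CITY_GRADES = ["F or Z", "B", "A"]       # slowest-changing column
NAME_GRADES = ["A", "B", "F or Z"]
STREET_NAME_GRADES = ["A", "B", "F", "Z"]  # fastest-changing column


def fig6_rows():
    """Return (name, st#, street name, city, state, P.O. box) tuples."""
    rows = []
    # product() varies its last argument fastest, matching the row order.
    for city, name, street in product(CITY_GRADES, NAME_GRADES, STREET_NAME_GRADES):
        rows.append((name, "Ignore", street, city, "A", "Ignore"))
    return rows


rows = fig6_rows()
for number, row in enumerate(rows, start=1):
    print(number, *row)
```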




METHOD FOR RATING A MATCH FOR A GIVEN ENTITY FOUND IN A LIST OF ENTITIES

FIELD OF THE INVENTION

The present invention relates [...] particularly to a method configured to find a match for a given entity in a database containing information on a large number of entities.

BACKGROUND OF THE INVENTION

Systems of the foregoing type are well known. For instance, in the credit industry, credit history information on a given business entity being considered for credit is typically processed through a commercially available database, such as a Dun & Bradstreet database. A user may input the name of a business entity into a processor connected to the database, which then locates that given entity in the database and retrieves its credit history information. The credit history information is then used to make a decision on whether to grant or withhold credit for the given entity.

To simplify matters with a simple example, assume that the user has an interest in making a sale on credit to XYZ Corp., which is located at a particular address in a particular city. XYZ Corp. is the "given entity" or "given entry." After the user inputs this identifying information, the database is searched and an entry for XYZ Corp. located at a different address in the same city is identified from the database. A determination must then be made as to whether the identified XYZ Corp. is the same as the given entity XYZ Corp. If the determination is that they are the same, then the credit information from the database for the identified XYZ Corp. is used in making the credit decision for the transaction with the given entity.

Database systems such as these have far reaching applications beyond credit industry applications as illustrated above. In another illustration, a wholesale distribution entity may periodically distribute product information documents to retail entities. The costs associated with these documents may range from inexpensive product brochures (e.g., 50 cents each) to relatively costly product catalogs (e.g., $5.00 each). In order to save costs, since thousands of these product information documents may be distributed, the wholesale distribution entity may wish to direct the more expensive catalogs to those retailers having a high sales volume, and the less expensive brochures to retailers having a low volume of sales. In this application, the database system would be accessed to identify sales information on certain entities, as opposed to credit history information.

As will become apparent from the discussion that follows, the present invention is useful in broad-ranging applications, including both of the foregoing illustrations. In order to better explain the concepts and teachings on the present invention, however, the illustrations provided hereinafter will generally focus on the credit industry application presented above.

Business entities are typically listed in a database by what can be called attributes. The most common attributes are those which identify the entity, such as the business name and location. Location can be broken down into a number of attributes which include street number, street name, P.O. box number, city, town or the like, state (if in the U.S.) or country, and telephone number. These are common attributes which are found in many commercial databases reporting information on business entities. Other attributes are, however, sometimes utilized.

When it is desired to find a match for a given entity within such a list of business entities, inconsistencies in listing information can create matching problems. In some instances, inconsistencies can result from erroneous information stored in the database itself, and also from erroneous information input when identifying a given entity for whom a match is desired. In other instances, inconsistencies may result due to styles (e.g., abbreviations) used to identify attributes.

Credit departments typically have procedures for dialing up databases and obtaining credit information. Usually, the identification process is rather straightforward, and may be performed automatically. However, because of the different styles of stating names and addresses and the different care which is exercised by a large number of people in collecting information, the correlation between a given entity and the possible matching entities in the database do not always match precisely. When this occurs, human intervention is often necessary to make the intermediate determination as to which one of the one or more identified entities matches the given entity, before the ultimate determination of whether to grant or withhold credit can be made. Proper intermediate identification is particularly important in large dollar transactions. The human intervention usually involves either making an on-the-spot judgment as to the correct match, or making follow-up phone calls to investigate or verify the given entity.

Based on the amount of time required to verify the identity of a given entity, and the cost associated with the human (e.g., credit manager, clerk, etc.) who makes those decisions, it will be found that this somewhat mundane step in the credit approval procedure can consume a significant amount of dollar resources. Indeed, in situations where a large number of such credit decisions are made, it is found to be commercially feasible to isolate a subset of justifiable risks (i.e., those where a reliable match is made), and grant credit to those risks without the need for human intervention.

There are generally available processes and procedures, and commercially available software packages for determining a "best fit" match for any given entity within a large compilation or list of entities. For example, a system known as Soundex is well known and has long been used to find words that sound similar but are spelled differently. Similarly, a system known as AdMatch was used to help people find the proper 1970 census tract, using a base address.

In the credit industry, systems like the foregoing are used by credit reporting agencies to identify a list of possible matching entities and numerically score the match of the identifying attributes (name, address, city, etc.) for each entity identified. More particularly, automated matching systems are available, which parse, normalize, and further process a given entry to identify likely matches. These systems can also provide attribute-by-attribute information, such as a numerical score, reflecting the reliability of the match of each attribute. Thus, a user might be faced with an attempted match where the name matches exactly and thus has a 100% score, the street address has a 63% score, the town 79%, and the phone number a no entry condition. But, again, human intervention is usually required as a credit manager, clerk, or other appropriate person must examine the entries, the scores, and the overall context of the request in order to determine whether the information provided by the credit database indeed matches the characteristics of the given entity.

More sophisticated systems are known, wherein the individual attribute scores are weighted by factors based on
empirical data to produce a composite score. These systems have been less than effective in the past, and it is typically found that programmers are continuously adjusting weighting factors to accommodate new conditions. As additional empirical data is collected, the weighting algorithm can be further refined. Thus, it can be appreciated that the weighting function or algorithm is an ever-changing device. Unfortunately, while the newly adjusted weighting factors may accommodate a new condition successfully, they often unexpectedly and adversely affect other computations, and accurate matching problems persist.

It will be seen that a significant cost savings can be achieved by further automating and improving the credit approval process, thereby reducing or eliminating the need for a human to become personally involved.

SUMMARY OF THE INVENTION

In view of the foregoing, it is a general aim of the present invention to minimize the amount of human intervention [...] a given [...] a large list of entities.

Another object of the present invention is to further streamline the credit granting process by automatically providing a confidence indicator for the overall match between a given entity and a selected entity from the database.

In that regard, it is an object of the present invention to translate individual attribute matching scores into a composite score, and to generate a confidence indicator therefor.

Overall, an objective of the invention is to preserve the limited resource and expense of human judgment in granting credit to those situations where judgment is required, and to [...] where the automatic matching system has a statistically high confidence level in the accuracy.

To achieve the foregoing and other objects, the present invention is generally directed to a method for utilizing and evaluating information, automatically and without human intervention, to select a given entity from within an extensive database containing a large number of entities. The present invention is intended to operate in conjunction with systems, wherein each entity stored in the database is identified by a plurality of attributes, such as name, address, telephone number, etc., and the system operates to match the attributes of a given entity with the attributes of entities stored within the database in order to indicate the identity of closely matching entities. In addition the system provides numerical scores for each attribute, indicating the quality or accuracy of the match for each of the attributes. The method of the present invention assigns a grade to each score of a plurality (n) of the attributes, with the grade being selected from a small number of possible grades, distinguishing between at least a clear match, a clear mismatch, and a possible match. Thereafter, the grades are assembled for the n attributes to produce a key for each particular entity, identified as a closely matching entity. The method then addresses a memory with each key to produce a match indicator, or confidence code, that reflects the overall quality of the match for the particular entity. The matching indicators stored within the memory are based on empirical information for the same or similar grade keys derived from tested matches for the same or similar key.

It is an important feature of the present invention to substantially reduce the need for human intervention and the exercise of human judgment. In one application, the present invention facilitates the automation of the credit granting/denial process by automatically granting or denying credit, based upon the confidence code associated with a particular match. In this regard, a threshold confidence code is used to make the intermediate determination as to which one of a plurality of identified entities matches the given entity. The threshold value is preferably configured to vary depending upon the type and size of the requested transaction. In small dollar transactions, for example, the threshold confidence code is smaller than in large dollar transactions.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings incorporated in and forming a part of the specification, illustrate several aspects of the present invention, and together with the description serves to explain the principles of the invention. In the drawings:

FIG. 1 is a flowchart showing the principal steps executed by a system searching a database for a given entry, and assigning match scores to each identifying attribute;

FIG. 2 is a flowchart illustrating the primary [...] of the present invention;

FIG. 3 shows a table illustrating a match scoring example;

FIG. 4 is a diagram conceptually illustrating the match grade key memory addressing of the preferred embodiment of the present invention;

FIG. 5 is a flowchart illustrating the logic flow of the present invention in a credit granting/denying application;

FIG. 6 is a table illustrating assumptions made in initially deriving and storing confidence codes and matching percentages in the preferred match embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

As previously described, credit reporting agencies maintain computerized databases of business entities, which are stored and referenced by identifying attributes. The most common attributes describe the name, address, and telephone number of a particular entity. In the present invention, it is preferred to use the entity name, street number, street name, P.O. Box number, city and/or postal code, state (if in the United States) or country, and telephone number. Of course, other attributes can be included, such as state of incorporation, approximate number of employees, manufacturing or service organization codes using a generally accepted standard coding format, and the like.

When searching for a given entity in a database of stored entities, identification inconsistencies often arise. Although each attribute will normally have information entered, there are instances where no information will be assigned to a stored attribute. In other instances, the user may have only part of the necessary information, such as the corporate name and corporate postal address, but no information on telephone number. In yet other instances the telephone number entered may correspond to a particular direct dial, internal number of the given entity, rather than the general number of the entity, and therefore the telephone number attribute does not match. Another area where irregularities can occur is in street names, for example, the choice between "Road" or "Rd." in the street address. The degree of formality employed in recording the corporate name also causes problems (e.g., "Co." versus "Corp.").

Having described the typical environment of the present invention, reference will now be made to the figures, wherein FIG. 1 is a flowchart illustrating the principal steps
in achieving the broad concepts of the present invention. Namely, taking a plurality of attributes that describe a business entity, searching through a database of business entities, identifying one or more likely matching entities, and deriving a confidence indicator representing the probability of a proper match, for each identified entity.

The listing of preferred attributes are presented in the block [...] that additional attributes (as previously mentioned), fewer attributes, or different attributes may be used and yet maintain the highly effective matching ratio achieved by the present invention described herein. It may be determined, for example, that the matching scores obtained for the city, street name, and postal code attributes are sufficient to reliably identify a proper match, whereby the state or country code may be dropped. Likewise, it may be determined that, with an effective parsing and normalization routine, the entity name, street, and telephone numbers are enough attributes to provide accurate results. In another foreseeable embodiment, it may be desired to substitute an alternative attribute for a presently existing attribute, such as substituting a state of incorporation attribute for the state or country attribute. It is emphasized that the concepts of the present invention as described and claimed herein are not tied to any particular attributes, or any particular number of attributes, but apply to all of the foregoing and other similar scenarios.

Before the individual attributes of a given entity can be scored, so as to rate the quality or likelihood of a match between a particular attribute of the given entity with the corresponding attribute of an entity in the database, each attribute entered is preprocessed at step 12. Various techniques are known and have been used in the past for preprocessing attributes. The particular techniques are preferably carried out as a precursor to the method steps of the present invention, and therefore an exhaustive discussion of the various known techniques will not be discussed herein.

The preprocessing step typically begins by parsing the string of characters into words, which are sometimes referred to as "tokens." Thereafter, standardization and normalization routines are performed to reduce or eliminate inconsistencies among abbreviations. In the standardization process, each parsed word is reviewed and replaced with an industry standard equivalent, when appropriate. "Street," for example, is standardized as "St." "Connecticut" or "Conn." are standardized to "CT." The Post Office was the driving force behind standardization terminology, particularly for addressing conventions.

The normalization process is very similar to the standardization process, except that it is concerned with converting non industry-standard words to some common form. For instance, the word "manufacturing" (and various forms thereof) is converted to "mfg." In addition, the normalization process may utilize phonetics, wherein it removes vowels and certain letter groups such as "ing." During the standardization and normalization process, "noise" words such as "a" and "and" are eliminated, and miscellaneous punctuation is handled, either by stripping all punctuation or maintaining it in a consistent fashion. Moreover, the normalization routine may convert all letters to upper or lower case.

In some embodiments, the preprocessing step 12 can also utilize a lexicon to further assist the standardization and normalization of the attributes. It is preferred that the same routines used in preprocessing the attributes of the given entry are also utilized during the initial input and storage of all entries maintained on the database. Utilizing consistent preprocessing routines thus facilitates the generation of a more precise measure of matches, even in the case where the data is input with the variety of styles and abbreviations, which can be expected when identifying the name or location of a particular entity.

The next step is to search the database to develop [...] ways to deal with the issue of searching a large listing of entries to locate a match for a [...] matchkey searches are well known, and [...] several letters [...] to produce a [...] matchkey is then compared to like matchkeys formed from the database entries. To illustrate this point, the matchkey of the present invention may be formed by aggregating the first five letters from the entity name, the first four letters from the street name, and the first three digits from the postal code. This aggregation would be formed from the given [...] entries [...] to the [...] given [...] would be identified as possible matches.

After the search step 14 is completed, the processor [...] score [...] attribute. The score is a statistically generated number that reflects the quality of the match between a particular attribute from the given entity and an identified entity. The higher the number, the closer the match, with a 100 representing a precise match. [...] of the given entry [...] successively compared against each attribute of all identified entries to [...] a score for each attribute of each particular entity. A variety of algorithms are known that can make this statistical comparison.

[...] and attribute scoring as illustrated in the foregoing description, and the present invention is not limited to any particular method or approach in realizing that result. Indeed, the method of the present invention is concerned with processing the attribute scores to formulate a confidence indicator that reflects the overall quality of the match between a given entity and an entity identified from a large [...] of entities. To illustrate the broad concepts of the invention, reference is now made to FIG. 2.

Reference numeral 20 identifies a table listing the attributes preferably used to identify an entity. Directly across from each attribute is a numerical score that has been [...] to each [...] reflecting the [...] quality for that particular attribute. As previously mentioned, the numerical score is a number between 0 and 100 that represents an accuracy percentage associated with the match, with 100 percent being a perfect match. If no attribute entry is present, either in the database listing or in the given entry input by a user, a null value is inserted in the score column. It will be appreciated that, in the preferred embodiment, a zero score value, representing a very poor or a non-match situation, is quite different than a null value. While in the practice of the present invention a numerical value of zero could be used to represent a non-entry situation, and a numerical value of one could represent a no-match situation, it is significant that some distinction is made between a null value (no entry) situation, and a zero value (non-match) situation. A zero value (non-match) situation will serve to significantly reduce the likelihood of a match and thus reduce the value of the confidence indicator. In contrast, a null value merely indicates that no information was input (or
stored), and thus may not significantly reduce the value of chosen solely for the purpose of illustration. If one or more 

the confidence indicator. match grades were added to the presently preferred four (A, 

To better illustrate the scoring profile, reference is briefly B, F, and Z), the match score ranges would necessarily be 

made to FIG. 3, which is a table that presents four specific determined by empirical and statistical data. All 

examples of entity name attribute scoring. The table includes 5 embodiments, however, maintain a match grade to account 

four columns. The first column lists the entity name as stored for a no-entry condition. Likewise, it may be desired to 

in the database, which has previously been standardized and eliminate the no-entry grade of "Z", whereby only grades of 

normalized. The second column lists the entity name, as "A", "B", and "F» would be utilized 

input by a user, and the third column lists this input name After assigning a match grade to each attribute at step 22, 

after it has been standardized and normalized. Thus, the 10 the match grades are assembled into a key at 24. This key 24 

score listed in the fourth column reflects the match com- is used to address a look-up table 26, to retrieve an overall 

pari son between the first and third column entries. match confidence indicator. In the preferred embodiment 

In the first example, the entity name "ABC Manufactur- seven individual attributes are used, and each is assigned one 

ing" was entered by the user, normalized to "ABC MFG" of four match grades. Accordingly, there are 16384 (4 7 ) 

and compared against "ABC MFG CO." Based on statistical 15 possible key 24 combinations. Thus, the look-up table 26 

data, the processor would determine that the names are must have 16,384 address locations, 

likely one and the same, and therefore assign the attribute an Continuing with the example match grades and key 24 

extremely high matching score (99.5 percent in the illus- presented in FIG. 2, reference will be made to FIG. 4, which 

trated embodiment). The second example (after standard- conceptually illustrates how the key 24 is addressed to the 

ization and normalization) compares "ABC MFG CO w 20 look-up table 26. It can be appreciated that, since there are 

against "ABC Widget MFG." Although both names share four possible grades for each attribute, one quarter of the 

"ABC" and "MFG," the given entity does not have "Widget" table address space maps to entity names having a match 

in the title. Again based on statistical information, the omission may be simply a result of user error in entry. Accordingly, the processor assigns a matching score of 73.0 percent, reflecting a lesser likelihood that the entries are one and the same. In the third example, the names "XYZ MFG" and "ABC Widget MFG" are compared, and assigned a very low 34.0 percent matching score, since "MFG" is the only common word.

Finally, the last example illustrates a no-entry situation. In this situation, the user did not enter any entity name information. Based on other information (e.g., address and/or telephone number) entered by the user, the processor identified a possible matching database entry having the name "ABC Widget MFG." Nevertheless, since there is no attribute to compare the name to, the null value is carried across to the score column. In the invention's presently preferred embodiment, the score would be the same (i.e., null) if the user had entered name information but the identified database entry had no name information. That is, when either the given entity attribute or the stored entity attribute is a no-entry, the null value will be carried across to the score column. However, in an alternative embodiment, it may be desired to identify and distinguish a null condition for an attribute of an entry stored in the database, from a null condition for an attribute of a given entry.

Returning now to FIG. 2, the individual attribute scores are graded at step 22. Here, the relatively high resolution of the matching score is segmented into a limited grade set. Preferably, the limited grade set includes the possible grades of a clear match ("A"), a possible match ("B"), a clear mismatch ("F"), and a no-entry condition ("Z"). Utilizing the preferred match scoring approach, a match score of 90-100 percent receives a match grade of "A." A match score of 50-89 percent receives a match grade of "B," and a match grade of "F" is assigned to match scores below 50 percent. A match grade of "Z" is assigned to a null score, or a no-entry condition.

It may be desired to revise this grading step 22 to reflect additional gradations. For example, it may be desired to provide greater result resolution by assigning "A" to match scores of 92-100 percent, "B" to match scores of 70-91 percent, "C" to match scores of 40-69 percent, and "D" to match scores below 40 percent. These match scores are

grade of "A." Thus, considering the first attribute grade of the key, the look-up table address space is effectively reduced to 4,096 entries at 30. Similarly, 1,024 table entries (reference numeral 32) correspond to a match key 24 having a match grade of "A" for both the entity name and street number attributes. Continuing through the match key 24, FIG. 4 illustrates how the look-up table size effectively decreases by successive multiples of four, as additional match key 24 attribute grades are considered. Ultimately, only one location 34 remains and corresponds to the particular match key. Thus, one specific table address corresponds to each possible match key. In the figure, the values of 8 and 93.3 are illustrated as being stored in this address location. As illustrated, the grade key of "AABAFAZ" has a confidence code of 8 and a 93.3 percent likelihood that the identified entry is the same as the given entry input by the user.

FIG. 4 is provided merely for conceptual illustration, and it will be understood that mapping a particular key to a specific table location is a rather simple task, which may be handled mathematically with very little processing power and very little processor time. For example, each of the four match grades may be assigned a numerical value of zero through three. Assume that zero is assigned to "A," one to "B," two to "F," and three to "Z." The match key of "AABAFAZ" may be treated as the base four number 0010203, which is equal to 291 in base 10. This numerical value, then, may be used to address the memory look-up table 26.

There are a variety of ways, in addition to the foregoing, in which the match grade key may be mapped to a look-up table address. Indeed, the match grade letters discussed herein are largely for purposes of illustration, and in an alternative embodiment the matching scores may be mapped directly to a numerical grade, for example zero to three. Accordingly, in practice the grade key may be a seven digit string of numbers, instead of letters. The significant aspect is that a numerical value, whether directly or indirectly obtained, may be readily utilized to directly address a memory which stores the look-up table entries. In addition, the seven attribute grades may be combined in some way to form a key having fewer than seven digits.

Continuing with the description of FIG. 2, the look-up table 26 is an aggregate of memory locations, wherein each
location contains two numerical values that represent a quality of the match between the given entity and an identified entity: a confidence code and an accuracy percentage. The confidence code is a number ranging from 1 to 10, with 1 representing a low confidence in the match quality and 10 representing a high match quality. The accuracy percentage is a numerical value preferably expressed as a percentage, so that a value of 100, for 100 percent, reflects a perfect match. These numerical values are derived in part from empirical data, and in part from a statistical formulation.

With a preferred table size of 16384 entries, it will be appreciated that the initial calculation of these numbers may be a tremendous undertaking. That is, to generate 16384 numbers to initially fill the table, wherein each number is based upon a statistically sufficient sampling of test cases, would require an enormous amount of time. Accordingly, it is preferred to use some means of simplifying the process.

Based upon experience, it was found with the present invention that certain assumptions could be made, without appreciably distorting the derived confidence codes and match percentages. For example, it was found that the match grade for the state code must be an "A" for a match to occur. It was also found that the phone number was of very limited utility. Indeed, this component was primarily used only in tie-breaking situations, for example where one or more entities were identified as possible matching entities with substantially equal matching scores.

Another assumption that was made to simplify the process was to ignore street number and P.O. Box number components. Because entities often have multiple buildings, or multiple departments within a single building, these components often fail to match even though the entity is a proper match. The primary assumption, however, was to assume a state component match grade of "A." It was found that the state component always matched.

FIG. 6 is a table illustrating the various assumptions that were made in simplifying the table size for computing the initial confidence codes and match percentages. As shown in the figure, match grades of "F" and "Z" were often grouped. Ultimately, based on the assumptions, the 16384 entry table was simplified down to 36 families of possible component match combinations.

After simplifying the table to the 36 families shown in FIG. 6, over 4,500 test entities were assessed with a match grade, and then manually verified. The confidence code and match percentage values, then, were derived from the number of correct and incorrect manual verifications, from the over 4,500 entity sampling. The values corresponding to a particular family were written into the appropriate locations within the table, so as to fill the entire 16384 entries.

While the foregoing process was undertaken to generate an initial table of values, it will be appreciated that the maintenance of the table will be an ongoing process. As confidence codes and match percentages for a particular table entry, or group of entries, are found to be inaccurate, the value may be updated. In this regard, it may be desired to maintain empirical data to update table entries. Accordingly, it is understood that, over time, the resolution of the table will migrate from the initial resolution of 36 families of values, to a much finer and even more accurate resolution of values. It should also be understood that the foregoing process was undertaken solely to generate an initial set of confidence code and match percentage values, and should not be read as a limitation upon the method steps of the present invention.

It is contemplated that by keeping records over time of the accurate and false identifications, data stored in the look-up table 26 may be revised to improve the accuracy and resolution of the confidence codes stored therein. In this regard, it may be desired to divide a particular match set into several smaller match sets. Of course, as the resolution of the look-up table 26 is increased, the more tedious the record-keeping becomes.

The look-up table 26 may also be modified by adding additional grades. For instance, additional grades C, D, and E may be added. Adding further grades substantially enlarges the size of the look-up table 26. Adding one additional grade C, for example, enlarges the table from 16,384 (4^7) entries to 78,125 (5^7) entries. In yet a further embodiment, it may be desired to add, or reduce, the number of attributes utilized to generate the match key. Adding an additional attribute, for a total of eight attributes, would increase the look-up table size from 16,384 entries to 65,536 (4^8) entries. Likewise, deleting an attribute, for a total of six, would decrease the look-up table size to 4,096 (4^6) entries.

The tradeoff, therefore, is that the look-up table 26 is increased in size as additional grades or attributes are added. Certainly, with the memory and processing power presently available, those elements do not restrict some growth of the table size over the preferred 16384 entry size. However, there is a substantial cost in determining the empirical data that is initially entered into the table, and with the maintenance of that data as further empirical information is obtained. On the one hand, increasing the table size may be desired to further increase the accuracy or resolution of the results. On the other hand, the user is ultimately interested only in whether or not the given entry and identified entry are a likely match, and the cost associated with expanding the table size becomes prohibitive.

It was previously described that certain systems are known that use formulaic approaches, whereby a mathematical equation is derived for transforming the match scores into a single numerical value. Under this approach, empirical data is used to derive weighting factors that form part of the equation. When new conditions arise, or new empirical data is gathered, that effects a change in the equation, the equation then often fails to deal as effectively with previous conditions.

The particular approach of the present invention of utilizing a look-up table 26, which has discrete locations and definite values for certain match grade keys and thus match scores, offers significant advantages over such formulaic or equation driven approaches. Continuing the earlier illustration, wherein the match key "AABAFAZ" mapped to the memory location of the look-up table containing the confidence code of 8 and accuracy percentage value of 93.3 percent: suppose that as additional empirical data is collected it is determined that match keys of "AABAFAZ" are in fact accurate 95.6 percent of the time. Then, the numerical value in that single table location may be changed, without affecting the results obtained from any other match key. In this way, the values of the look-up table 26 may be updated as further empirical data is gathered. It can be appreciated that the formulaic approach does not lend itself to such ready adjustments.

The ultimate decision to grant or deny credit for a given transaction may be based upon a combination of factors including the dollar amount sought and the match accuracy percentage. Higher dollar transactions demand a higher accuracy percentage. In this regard it may be desired to supplement the accuracy percentage value with a simple
numerical value ranging, for example, from one to ten, to serve as a confidence indicator. A value of one would correspond to low percentage values of the value obtained from the look-up table, and thus represent a low confidence level in the match, whereas a value of ten would correspond to high percentage values of the value from the look-up table and represent a high confidence level.

It is contemplated that the system utilizing the present invention will utilize the confidence indicator in providing automation. For example, for low dollar transactions a lower confidence level may be utilized to automatically grant the credit request. In higher dollar transactions the threshold value of the confidence indicator is increased. To better illustrate this point, reference is made to FIG. 5, which is a flowchart showing the logic flow for the present invention, when employed in a credit granting/denying application.

The process begins at step 50, with a user inputting the appropriate attributes for a given entity, for which credit is desired. The system then searches a database of business entities to retrieve a listing of possible (or likely) matching entities, assigns a numerical score to each attribute, assigns a grade to each attribute score, and forms a grade key (step 52). Utilizing the grade key, a memory look-up table is addressed to retrieve both a confidence code and an accuracy percentage (step 54). As mentioned above, the present invention then utilizes the confidence code, in connection with the dollar value of the credit transaction sought and the retrieved credit history information, to determine whether to grant or deny the credit request.

More specifically, step 54 is executed for each identified entity. The entity having the highest accuracy percentage and confidence code is then "selected" as the matching entity, and the system then assigns a "risk value" (i.e., small, medium, or large) to that entity (step 55), the "risk value" being based in part on the amount of credit sought. Based upon the risk value, a threshold value for the confidence code is used to determine whether to access and retrieve credit history information (step 64) or proceed with human review or clerical follow-up (step 66).

In very small dollar transactions, the cost of the credit data is not justified, and the whole match issue and subsequent retrieval of credit information may be avoided. In small dollar transactions, the cost associated with the review of match candidates is often a relevant cost component of the overall cost of the credit information, and is therefore preferably reduced (for example, by automating the process). As the dollar amount grows, the justification for the expense of accessing credit history information increases. However, as the confidence level of a particular match decreases, the expense justification likewise decreases. Accordingly, the risk value and the confidence code are both factors in determining whether to access and retrieve credit history information.

If the risk value is small, for example, then the confidence code must be greater than a predetermined threshold value X in order to retrieve credit history information (steps 58 and 64). If the risk value is medium, then the confidence code must be greater than a predetermined threshold value Y in order to retrieve credit history information (steps 60 and 64). Finally, if the risk value is high, then the confidence code must be greater than a predetermined threshold value Z in order to retrieve credit history information (steps 62 and 64). Otherwise, human intervention or clerical follow-up is required (step 66). In the foregoing illustration, the predetermined value X is less than Y, which is less than Z.

In those instances where credit history information is accessed (step 64), the system evaluates the credit rating, obtained from the retrieved information, in connection with the amount of credit desired (step 68) to determine whether to grant credit (step 70) or deny credit (step 72).

The above example is provided to illustrate just one way that the confidence code and/or accuracy percentage may be used in automating a particular transaction. It is understood that similar uses of the confidence code may be employed in systems having other applications as well (i.e., non-credit industry applications).

Another significant advantage of the preferred system resides in the accommodation of no-entry conditions. When a user is searching for a given entity with a relatively common name, such as "ABC Tire Co", "Your Tire Store", "ABC Locksmith", or "Just Tires," many entities will likely be identified as possible matching entities, and therefore it will be necessary to more fully enter identifying information so that a good match may be achieved. Experienced users, however, may recognize that when the entity name is rather unique, accurate results may be obtained even when merely entering the entity name. Relatedly, users may sometimes enter less than all attributes for certain types of transactions. For example, in a credit checking situation, when the dollar amount sought is relatively small, users may input less than all the attributes. In large dollar transactions, however, it will be desired to enter all the attributes to better ensure a higher confidence indicator.

As previously mentioned, it may be desired to use additional and/or alternative attributes to identify entities. Similarly, in an alternative form of the present invention, geographical population density data may be utilized in generating the confidence code or accuracy percentage. For example, if the city identified in a location attribute corresponds to a city having a high population density (e.g., New York, N.Y.), the resulting accuracy percentage and confidence code may be smaller than they would be for a less populated city (e.g., Charleston, S.C.), assuming all other attributes and scores remain the same. That is, the accuracy percentage and confidence code for "ABC Tire Store" in New York City will be lower than the accuracy percentage and confidence code for "ABC Tire Store" in Charleston.

It will be appreciated that the concepts and teachings of the present invention as described above apply equally to systems that invoke a direct on-line connection with the database, as well as systems that submit batch jobs to the database. That is, a user seeking to verify a small number of entities may dial up and connect to a database, establishing a direct, on-line connection. Thereafter, entity attributes may be input and searched against the database in real-time. Alternatively, particularly when a large number of entities are to be searched, the processing may be submitted to the database as a batch job. In this regard, a user may append a file containing hundreds, or even thousands, of entity attributes to an overnight batch request, for example. The database would then, one by one, parse the entity attributes, identify possible matching entities, look up the confidence codes and match accuracy percentages, select the appropriate match, and append the appropriate information and data.

The foregoing description of various preferred embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obvious modifications or variations are possible in light of the above teachings. The embodiments discussed were chosen and described to provide the best illustration of the
principles of the invention and its practical application to
thereby enable one of ordinary skill in the art to utilize the 
invention in various embodiments and with various modi- 
fications as are suited to the particular use contemplated. All 
such modifications and variations are within the scope of the 
invention as determined by the appended claims when 
interpreted in accordance with the breadth to which they are 
fairly, legally, and equitably entitled. 
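
By way of a non-limiting illustration, the grading and key-addressing steps set out in the foregoing description may be sketched in Python roughly as follows. The function names `grade`, `grade_key`, and `table_address` are hypothetical and do not appear in the specification. The grade thresholds and the digit assignment (A=0, B=1, F=2, Z=3) follow the preferred embodiment, and the seven example scores are those shown in the representative drawing (95, 92, 87, 94, 47, 100, and null). Treating a fractional score between 89 and 90 as a "B" is an assumption, since the description gives only whole-number ranges.

```python
from typing import Optional, Sequence

# Digit assignment used in the description's base four example.
GRADE_DIGIT = {"A": 0, "B": 1, "F": 2, "Z": 3}


def grade(score: Optional[float]) -> str:
    """Map one attribute score (in percent, or None for no-entry) to a grade."""
    if score is None:
        return "Z"  # no-entry condition
    if score >= 90:
        return "A"  # clear match
    if score >= 50:
        return "B"  # possible match
    return "F"      # clear mismatch


def grade_key(scores: Sequence[Optional[float]]) -> str:
    """Assemble the per-attribute grades into a grade key."""
    return "".join(grade(s) for s in scores)


def table_address(key: str) -> int:
    """Read the grade key as a base four number to obtain a table address."""
    address = 0
    for letter in key:
        address = address * 4 + GRADE_DIGIT[letter]
    return address


# Name, street number, street name, PO box, city, state, telephone.
scores = [95, 92, 87, 94, 47, 100, None]
key = grade_key(scores)
address = table_address(key)
print(key, address)
```

Under this digit assignment the key "AABAFAZ" is the base four number 0010203, which the sketch evaluates to 291 in base 10. A seven-grade key always yields an address between 0 and 16,383, so every possible key maps to exactly one location of a 16,384-entry table.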
What is claimed is: 

1. A method for utilizing and evaluating information, 
automatically and without human intervention, derived from 
a matching system, the matching system being of the type 
which searches an extensive database containing informa- 
tion on a plurality of entities, each entity being identified by 
a plurality of attributes, and the system being of the type 
which matches the attributes of a given entity with the
attributes of entities stored within the database to indicate 
the identity of closely matching entities along with numeri- 
cal scores for each attribute indicating the quality of the 
match for each of the attributes, the method comprising the 
steps of: 

assigning a grade to the score of a plurality n of the 
attributes, with the grade being selected from a small 
number of possible grades distinguishing between at 
least a clear match, a clear mismatch, and a possible 
match condition; 

assembling the grades for each of the n attributes to 
produce a key for a particular closely matching entity; 
and 

addressing a memory with the key to retrieve a match 
indicator that reflects the overall quality of the match 
for the particular entity, the memory containing match- 
ing indicators based on empirical information for the 
same or similar grade keys. 

2. The method according to claim 1, wherein the matching
indicators stored in the memory are also based on statistical 
formulations. 

3. The method according to claim 1, wherein the grade 
key is an n digit key. 

4. The method according to claim 1, wherein each 
attribute grade is based upon the numerical score associated 
with that attribute. 

5. The method according to claim 1, wherein the small 
number of possible grades further includes a grade for a 
no-entry condition. 

6. The method according to claim 5, wherein a no-entry 
grade is assigned to a no-entry condition in an attribute of 
the given entity. 

7. The method according to claim 5, wherein a no-entry 
grade is assigned to a no-entry condition in an attribute of an 
entry stored in the database. 

8. The method according to claim 1, wherein the small 
number of possible grades further includes a first no-entry 
grade assigned to a no-entry condition in an attribute of the 
given entity and a second no-entry grade assigned to a 
no-entry condition in an attribute of an entry stored in the 
database. 
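
As a worked illustration of how the size of the grade set affects the addressable memory, the arithmetic given in the description can be reproduced with a one-line helper. The name `table_size` is hypothetical. The sketch assumes one table entry per possible grade key, so that a set of g grades over n attributes requires g^n entries. Distinguishing two no-entry grades in the manner of claim 8, alongside the match, possible match, and mismatch grades, is one way of arriving at a five-grade set.

```python
def table_size(num_grades: int, num_attributes: int) -> int:
    """Number of distinct grade keys, assuming one table entry per key."""
    return num_grades ** num_attributes


preferred = table_size(4, 7)        # four grades, seven attributes
five_grades = table_size(5, 7)      # one additional grade
eight_attributes = table_size(4, 8) # one additional attribute
six_attributes = table_size(4, 6)   # one attribute removed

print(preferred, five_grades, eight_attributes, six_attributes)
# prints: 16384 78125 65536 4096
```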

9. In a computerized system for storing information on a 
large group of business entities and automatically selecting
a member of the group as a likely match with a given entity,
the system including a database for storing a compilation of
the large group of entities which are identified by a plurality
of attributes, and a processor for accepting information
specifying the attributes of a given entity and searching the
compilation to identify possible matches with the listed
entities, the processor being programmed to score the qual- 
ity of a possible match for a plurality n of the attributes of 
each identified entity, a method for automatically, and without
human intervention, determining the quality of the
match produced by the processor and comprising the steps 
of: 

assigning a grade to the score of each of the n attributes, 
the grade being selected from a limited grade set 
including distinctive grades for a match, a mismatch, a 
possible match, and a no-entry condition; 

composing a key from the grades for the respective 
attributes; 

addressing a table with the key; and 

extracting information from the table at an address speci- 
fied by the key, the information reflecting the reliability 
of the match based on stored information statistically 
obtained from similar keys for other matches. 

10. In a computerized system for storing information on 
a large group of business entities and selecting a member of 
the group as a likely match with a given entity, the system
including a database for storing a compilation of the entities 
which are specified by information on a plurality of 
attributes, and a processor for accepting information speci- 
fying attributes of a given entity and searching the compi- 
lation to identify possible matches with the stored entities, 
the processor being programmed to score the quality of a 
possible match for each attribute of an identified entity, a 
method for automatically and without human intervention 
determining the quality of the match produced by the 
processor comprising the steps of: 

selecting a number n of the attributes which will be used
to grade the reliability of the match;

assigning a grade to the score associated with each of the
n attributes, the grades being selected to at least include
distinct grades for a match, a mismatch, a possible
match and a no-entry condition;

assembling a key from the grades assigned to the n
attributes;

providing a memory table addressable by all possible 
keys, the table having match reliability data stored 
therein; and 

addressing the table to obtain the match reliability data 
specified by the key. 

11. The method according to claim 10, further including 
the step of analyzing the obtained reliability data to provide 
a confidence level for the key. 
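
One possible way of keeping the match reliability data for each key, sketched here under stated assumptions, is to tally manually verified outcomes per grade key and derive the accuracy percentage from the tally. The `KeyRecord` structure, the `record_verification` helper, and the sample counts are all hypothetical and serve only to illustrate that revising the data for one key leaves every other key untouched; the description does not prescribe this bookkeeping.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class KeyRecord:
    """Tally of manually verified outcomes for one grade key (hypothetical)."""
    correct: int = 0
    incorrect: int = 0

    def accuracy_percent(self) -> Optional[float]:
        """Share of verified matches that were correct, or None if untested."""
        total = self.correct + self.incorrect
        if total == 0:
            return None
        return round(100 * self.correct / total, 1)


def record_verification(table: Dict[str, KeyRecord], key: str,
                        was_correct: bool) -> None:
    """Update only the table location addressed by this grade key."""
    record = table.setdefault(key, KeyRecord())
    if was_correct:
        record.correct += 1
    else:
        record.incorrect += 1


# Made-up sample data: one existing entry, then 1,000 new verifications
# for the key "AABAFAZ", of which 956 were confirmed as correct matches.
table: Dict[str, KeyRecord] = {"AAAAAAA": KeyRecord(correct=99, incorrect=1)}
for i in range(1000):
    record_verification(table, "AABAFAZ", was_correct=(i < 956))

print(table["AABAFAZ"].accuracy_percent())  # 95.6
print(table["AAAAAAA"].accuracy_percent())  # 99.0, unchanged by the updates
```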

12. A method of processing commercial transactions 
comprising the steps of: 

accessing a commercial database including a list of a large 
number of business entities with associated business 
data, the business entities being identified by a plurality 
of attributes; 

automatically searching within that database to identify 
entities that possibly match a given entity specified by 
a plurality of attributes; 

obtaining numerical scores reflecting the quality of the 
match of each attribute; 

transforming the numerical score for each attribute to a 
grade selected from a limited subset of grades, the 
subset of grades reflecting a likely match, no match, 
and possible match conditions; 

using the grades for the possible match to form a key, for
each identified entity; 

utilizing the key to address a memory table to retrieve 
reliability information stored in the memory, the reli- 
ability information being derived from empirical infor- 
mation; and 
using the reliability information to select a matching
entity from the list of identified entities.

13. The method according to claim 12, wherein the subset
of grades further includes a grade reflecting a no-entry
condition.

14. The method according to claim 12, further including
the step of using the reliability information in connection
with the business data associated with the selected entity to
automatically reach a decision for processing the commercial
transaction.

15. The method according to claim 14, wherein the
commercial transaction is a credit granting/denying transaction
and the associated business data is a credit rating.
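
The threshold logic described with reference to FIG. 5, on which the automated decision of claims 14 and 15 may rest, can be sketched as follows. This is a minimal sketch only: the names `Risk`, `THRESHOLDS`, and `next_step` are hypothetical, and the numerical thresholds 4, 6, and 8 are placeholders chosen solely to satisfy the stated ordering X < Y < Z, since the description does not give values for X, Y, or Z.

```python
from enum import Enum


class Risk(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    HIGH = "high"


# Placeholder thresholds X < Y < Z; the actual values are not specified.
THRESHOLDS = {Risk.SMALL: 4, Risk.MEDIUM: 6, Risk.HIGH: 8}


def next_step(confidence_code: int, risk: Risk) -> str:
    """Choose between retrieving credit history and human review."""
    if confidence_code > THRESHOLDS[risk]:
        return "retrieve_credit_history"  # step 64
    return "human_review"                 # step 66


print(next_step(8, Risk.SMALL))  # retrieve_credit_history
print(next_step(8, Risk.HIGH))   # human_review
```

The same confidence code thus clears the threshold for a small-risk transaction but is referred for clerical follow-up when the risk value is high.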